ISP: Large-Scale In-memory Spatial Data Processing System (Demo Paper)

Authors

  • Simin You
  • Jianting Zhang

Abstract

Huge amounts of spatial data, such as GPS locations, are being generated every day, which poses big challenges for efficient spatial data processing. Most existing big spatial data processing techniques are based on disk-resident systems and have not fully taken advantage of modern hardware, such as large main memory capacities and multi-core processors. In this paper, we demonstrate our ISP system for in-memory processing of large-scale spatial data on distributed multi-core computing nodes. ISP is built on top of the open source Impala system, a leading Massively Parallel Processing (MPP) SQL engine, with two significant extensions. First, while Impala is designed to process relational data and does not support spatial queries, ISP supports spatial SQL query syntax at the front end and is able to process spatial queries at the back end. Second, while Impala currently supports neither indexed joins nor parallel joins on multi-core machines for non-equality joins, ISP provides on-the-fly parallel spatial indexing and query processing modules. We have performed experiments on a case study of point-in-polygon test based spatial joins. Using real data, experiments have shown that ISP on a two-node minicluster is 6.4X faster than PostgreSQL/PostGIS. ISP is also 10X faster than Hadoop-GIS, a big spatial data processing solution built on top of Hadoop/Hive, on a 10-node Amazon EC2 cloud cluster. With proper settings of distributed system parameters, ISP also scales well. We will demonstrate ISP to conference participants using both an EC2 cloud cluster and an in-house small cluster.
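The case study above is a point-in-polygon based spatial join backed by on-the-fly spatial indexing. To make that query concrete, here is a minimal single-process Python sketch of a common filter-and-refine pattern for such joins. It assumes a uniform grid index over polygon bounding boxes (filter) followed by an even-odd ray-casting test (refine). The grid, the cell size, and every function name here (`point_in_polygon`, `build_grid_index`, `spatial_join`) are illustrative assumptions. They are not ISP's actual indexing scheme or its Impala-based implementation.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Polygon = Sequence[Point]  # vertices in order; the last edge closes back to the first vertex


def point_in_polygon(p: Point, poly: Polygon) -> bool:
    """Even-odd ray casting: count edge crossings of a ray from p toward +x."""
    x, y = p
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Only edges that straddle the horizontal line through p can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def bbox(poly: Polygon) -> Tuple[float, float, float, float]:
    xs = [v[0] for v in poly]
    ys = [v[1] for v in poly]
    return min(xs), min(ys), max(xs), max(ys)


def build_grid_index(polygons: List[Polygon], cell: float) -> Dict[Tuple[int, int], List[int]]:
    """Filter structure (hypothetical): grid cell -> ids of polygons whose bbox overlaps it."""
    index: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    for pid, poly in enumerate(polygons):
        x0, y0, x1, y1 = bbox(poly)
        for gx in range(int(x0 // cell), int(x1 // cell) + 1):
            for gy in range(int(y0 // cell), int(y1 // cell) + 1):
                index[(gx, gy)].append(pid)
    return index


def spatial_join(points: List[Point], polygons: List[Polygon], cell: float = 1.0) -> List[Tuple[int, int]]:
    """Return (point_id, polygon_id) pairs where the point lies inside the polygon."""
    index = build_grid_index(polygons, cell)  # built on the fly, per query
    result: List[Tuple[int, int]] = []
    for qid, (px, py) in enumerate(points):
        key = (int(px // cell), int(py // cell))
        # Filter: only polygons registered in the point's cell are candidates.
        for pid in index.get(key, ()):
            # Refine: exact geometric test on each candidate.
            if point_in_polygon((px, py), polygons[pid]):
                result.append((qid, pid))
    return result


if __name__ == "__main__":
    square = [(0, 0), (2, 0), (2, 2), (0, 2)]
    triangle = [(3, 3), (5, 3), (4, 5)]
    pts = [(1, 1), (4, 3.5), (2.5, 2.5), (10, 10)]
    print(spatial_join(pts, [square, triangle]))
```

Running the script prints `[(0, 0), (1, 1)]`. The point `(2.5, 2.5)` passes the grid filter for the square but is rejected by the exact test. Because each point is probed independently, the outer loop could in principle be partitioned across cores or nodes. That is the kind of parallel, indexed non-equality join the abstract says Impala lacks natively.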



Publication date: 2014